++++++++++++++++++++++++++++++++++++++++++++++++++++
DOCUMENTATION FOR KWSEARCH version 1.3, January 1985
++++++++++++++++++++++++++++++++++++++++++++++++++++
If you have a printer, you may prefer to print this documentation
instead of viewing it on the screen. If you are using IBM-DOS,
you can print this file by returning to DOS, and then typing the
command,
COPY KWSEARCH.DOC LPT1:
Alternatively, if you have an ASCII compatible word processor,
you could use it to format and print this file.
Before You Use KWSEARCH
-----------------------
You should have a diskette with your "system" in drive A while
running this program. This can be accomplished by formatting a
new diskette with the /s option (see your DOS manual), copying
KWSEARCH.EXE to that diskette, and then using it in drive A.
NOTE: The symbol which you see at the end of paragraphs if you
view this file on your screen is an unprintable word
processor control character.
What KWSEARCH Does
------------------
KWSEARCH reads text from one or more "source files", and
transfers those "records" matching your "search criteria" to a
"destination file". A "record" can be thought of as simply a
paragraph.
 Source File                                 Destination File
+------------+                               +--------------+
|            |                               |              |
| "Records"  |                               |   Records    |
|            |                               |   Matching   |
|            |  -----> KWSEARCH ----->       |   Search     |
|            |                               |   Criteria   |
|            |                               |              |
+------------+                               +--------------+
Advantages of KWSEARCH
----------------------
KWSEARCH searches regular sequential text files for records and
key words. No key or index files are used and no "importing" of
text files is necessary before performing a search, as long as
the file is in ASCII format. This allows efficient use of disk
space and lets you conduct searches on files that would not
otherwise be useable as databases.
Source Files
------------
The source files contain the text which you wish to search. They
can be any regular text files in ASCII format, created with your
word processor, downloaded through a modem, or generated by
another program. Many word processing programs store files in
this format, and those which do not generally have utility
programs for conversion into ASCII format.
To determine if a file is in this format, you could use the DOS
command, TYPE filename. If the resulting screen output consists
of normal, legible text, then the file is an ASCII file. The
file you are reading now is an ASCII file.
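If you would rather test a file with a program than by eye, the
following Python sketch checks whether every byte is printable
ASCII or ordinary whitespace. It is not part of KWSEARCH; the
function name and the list of permitted control characters are
assumptions made for illustration, and the test is stricter than
the TYPE check described above.

```python
def looks_like_ascii_text(data: bytes) -> bool:
    """Return True if every byte is printable ASCII or common whitespace."""
    # Assumption: tab, line feed, carriage return, and the DOS
    # end-of-file marker (Ctrl-Z) are acceptable in a text file.
    allowed_controls = {9, 10, 13, 26}
    return all(32 <= b <= 126 or b in allowed_controls for b in data)


print(looks_like_ascii_text(b"KW: GUACAMOLE, AVOCADOS\r\n"))
print(looks_like_ascii_text(b"\x00\xffbinary data"))
```

A file that fails this check may still display legibly with TYPE,
for example if it contains a few word processor control
characters.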
You can search up to 10 source files during a single run. Simply
enter the names of the files when prompted. You can use the disk
drive prefixes A:, B:, etc. to specify files but subdirectory and
path parameters (.. and \) are not recognized by this version of
KWSEARCH. Therefore, if you have files organized into
subdirectories, all referenced files must be in the default
subdirectory of each drive.
Records
-------
A record is the basic unit of related information read by
KWSEARCH from a source file. Most database management programs
break a record down into fields, and require that a certain
amount of disk space be set aside for each field, whether that
space is used or not. Some programs have pre-defined limits on
field structure and field size. Others require that you define
the fields and their sizes for each database.
KWSEARCH is much simpler. It recognizes a record as a series of
up to 50 lines of text, separated from other records by a blank
line. Therefore a record is like a paragraph. In fact, you can
use KWSEARCH to search for paragraphs in a document as if they
were records in a database.
Key word lines are the only part of a record analogous to
fields, and they are optional. However, restricting a search
to words on key word lines makes the search run faster. In
either case, if a record's key words match the search criteria,
the entire record will be copied to the destination file, up to a
maximum of 50 lines per record.
If a record contains more than 50 lines, or more than 600 key
words, then the record is broken at that point and subsequent
lines are treated like another record.
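The record rules above can be sketched in Python as follows. This
is an illustration only, not the code used by KWSEARCH: the
function name is hypothetical, and the sketch applies the 50 line
limit but ignores the 600 key word limit.

```python
MAX_LINES = 50  # per-record line limit stated in this documentation


def split_records(text: str, max_lines: int = MAX_LINES) -> list[list[str]]:
    """Group lines into records; a blank line separates records."""
    records, current = [], []
    for line in text.splitlines():
        if not line.strip():           # blank line ends the current record
            if current:
                records.append(current)
                current = []
            continue
        current.append(line)
        if len(current) == max_lines:  # limit reached: break the record here
            records.append(current)
            current = []
    if current:                        # last record may lack a trailing blank
        records.append(current)
    return records


sample = "Guacamole\n2 medium avocados\n\nEnchilada Sauce\n1 pound tomatoes\n"
for record in split_records(sample):
    print(record)
```

A run of 120 lines with no blank line would come out as three
records of 50, 50, and 20 lines.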
The default definition of a key word line is any line beginning
with the characters "KW:". There can be up to 10 key word lines
in a record. All words on the key word lines in a record will be
compared with the search criteria to determine if the record is
to be saved.
You can redefine the identifier for your key word lines (e.g.
lines beginning with KEY:) or you can search the full text.
For a full text search, records must contain no more than 600
words each.
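As a rough illustration of how key words might be gathered from a
record, here is a Python sketch. It is not taken from KWSEARCH.
The function name is hypothetical, and the way words are split
(runs of letters, digits, and apostrophes) is an assumption, since
this documentation does not say how KWSEARCH separates words. The
600 word limit is not enforced.

```python
import re


def key_words(record_lines, prefix="KW:", max_kw_lines=10):
    """Collect the words KWSEARCH would compare for one record.

    With prefix=None every line is used (a full text search).
    """
    words = []
    kw_lines_seen = 0
    for line in record_lines:
        if prefix is None:
            body = line
        elif line.startswith(prefix):
            if kw_lines_seen == max_kw_lines:  # at most 10 key word lines
                continue
            kw_lines_seen += 1
            body = line[len(prefix):]
        else:
            continue
        # Assumption: a word is a run of letters, digits, or apostrophes.
        words.extend(re.findall(r"[A-Za-z0-9']+", body.upper()))
    return words


record = ["Guacamole", "2 medium avocados",
          "KW: GUACAMOLE, AVOCADOS, GARLIC, ONION, MEXICAN"]
print(key_words(record))
print(key_words(record, prefix=None))
```

Passing prefix="KEY:" corresponds to redefining the identifier.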
Search Criteria
---------------
The search criteria describe the conditions which must be met by
a record in order for it to be saved. They consist of three types
of information:
1) up to 10 key words,
2) a combination of those key words using the operators AND, OR,
and NOT,
3) instructions as to which lines in a record will be used
as key word lines.
The last "search set" defined will be the one used as the basis
for a search, although search statistics will be listed for all
other key words and search sets.
The best method of defining search criteria depends on the nature
of your source files and on your information retrieval
requirements. An example of a search for a recipe is used for
illustration.
Here are two sample records...
Guacamole
---------
2 medium avocados
2 T. lemon juice
1/4 t. pepper
1/2 small onion, chopped
1 clove garlic, minced
1/2 t. salt
...
Mix in a blender until smooth. Makes 1 and 1/4 cups.
KW: GUACAMOLE, AVOCADOS, GARLIC, ONION, MEXICAN
Enchilada Sauce
---------------
1 pound fresh ripe tomatoes
1 large red bell pepper
2 med. red onions
2 large cloves garlic
1/2 t salt
1/2 t crushed red pepper
1/4 t cumin
1/4 t black pepper
...
Cut tomatoes, peppers, and onion into small chunks. Combine in a
blender and puree.
KW: ENCHILADA SAUCE, TOMATOES, ONIONS, GARLIC, MEXICAN
Now, suppose you want to search for all recipes that contain the
key words AVOCADO or TOMATO. Run KWSEARCH, specify AVOCADO and
TOMATO as key words, then combine those key words with OR.
Lastly, specify the names of the source and destination files.
When the search begins, the following will occur:
KWSEARCH will read the guacamole recipe and will recognize
the words on the line beginning with "KW:" as the record's
key words. It will then compare those key words with the
search criteria. Since they meet the search criteria, the
recipe will be copied to the destination file. Then the
next recipe will be saved as well since its key words also
match the search criteria.
You have the option to print descriptive information in the
destination file along with the records themselves. Descriptive
information includes the date, starting time of the search, names
of source files, search criteria, and finishing time.
In version 1.3 of KWSEARCH, the following rules apply:
1. Key words can be either upper or lower case in the source
files and you can enter the search criteria in either upper
or lower case.
2. Key words in a record are truncated to the length of the
search criterion word before a comparison is made. Therefore,
if you are searching for AVOCADO, you will find AVOCADOES. It
is recommended that you specify the roots of words as search
criteria, for example, use GEO in order to be able to detect
both GEOLOGY and GEOCHEMISTRY. Note that comparisons are made
in a left justified field based on the length of the search
criterion word.
The search criterion word, GEO, is contained in each of the
following source file words:
GEOLOGY,
and GEOCHEMISTRY
However, if the search criterion word is CHEMISTRY, then
GEOCHEMISTRY does not match.
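Rules 1 and 2 can be expressed as a short Python sketch. The
function name is hypothetical and the code only illustrates the
comparison described above; it is not the KWSEARCH source.

```python
def word_matches(criterion: str, source_word: str) -> bool:
    """Truncate the source word to the criterion's length, then compare."""
    criterion = criterion.upper()                     # rule 1: case is ignored
    truncated = source_word.upper()[:len(criterion)]  # rule 2: left justified
    return truncated == criterion


pairs = [("AVOCADO", "avocadoes"), ("GEO", "GEOLOGY"),
         ("GEO", "GEOCHEMISTRY"), ("CHEMISTRY", "GEOCHEMISTRY")]
for criterion, word in pairs:
    print(criterion, word, word_matches(criterion, word))
# The first three pairs print True; CHEMISTRY vs GEOCHEMISTRY prints False.
```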
There are two types of "errors" that can occur when conducting a
search:
1) failure to save a record that you would like to have saved,
and
2) saving a record that was not really desired.
Errors 1 and 2 can be minimized by carefully selecting the key
words for defining records and for conducting a search. Error 1
can also be reduced by searching the full text instead of just
key word lines although this results in slower program
execution.
The approach you use to minimize these errors also depends on
the time required for a search, and the time required to manually
edit the destination file.
Careful assignment of key words can allow you to conduct
effective searches that are moderately fast. A way to ensure
that a desired record is found is to use consistency or
redundancy when assigning key words to a record. For example,
you can assign the words GEO and CHEMISTRY, or GEOCHEMISTRY and
CHEMISTRY instead of the single word GEOCHEMISTRY. Similarly,
you could specify (GEO OR CHEM) as part of the search criteria.
If execution time is a factor, it is more efficient to limit your
search to words on key word lines and specify more words in the
search criteria, than to search the full text. However, it may
be desirable to search the full text in order to make sure that
no records are missed, or it may be necessary to do so if special
key word lines are not present in the source file.
One tactic for effective searching is to divide it into stages.
In the first stage, conduct an "inflated" search to make sure all
possible records of interest are saved. This can be accomplished
by specifying lots of key words and combining them with OR's.
You can then browse through this file before using it as the
source file for more selective searches.
The sequence of defining search criteria might be as follows:
Key Word 1 = ? <AVOCADO>
Key Word 2 = ? <TOMATO>
Key Word 3 = ? <ONIONS>
Key Word 4 = ? <GARLIC>
Are the search words correct? (Y or N) <y>
SET 5 = ? <1>                  ...1 represents AVOCADO.
SET 5 = AVOCADO ? <o>          ...o represents OR.
SET 5 = AVOCADO OR ? <2>       ...2 represents TOMATO.
SET 5 = (AVOCADO OR TOMATO)    ...Set 5 is now defined.
SET 6 = ? <3>                  ...3 represents ONIONS.
SET 6 = ONIONS ? <o>           ...o represents OR.
SET 6 = ONIONS OR ? <4>        ...4 represents GARLIC.
SET 6 = (ONIONS OR GARLIC)     ...Set 6 is now defined.
The symbols "<>" indicate that the enclosed character or word is
typed, and then the enter key is pressed. Note that a prompt at
the bottom of the screen tells you what to do.
The search criteria used in the search will be those defined by
the last search set. When the search is completed, the number of
occurrences of each word and each search set are tabulated on the
screen. You might find it useful to define search sets that are
not actually part of the final search criteria. You can use this
to find out how many records meet various search criteria without
necessarily saving those records.
With practice, you will find that you can construct a sorted
destination file by running more than one search. For example,
you could search for records containing
AVOCADO AND (ONIONS OR GARLIC),
then conduct another search for
AVOCADO AND NOT(ONIONS OR GARLIC).
Note that the two sets above are mutually exclusive; that is, they
do not share any records in common. In combination, they include
all recipes with AVOCADO. By appending the second search to the
end of the first search, you would have a file with avocado
recipes sorted according to whether or not they contained onions
or garlic.
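The two-search approach can be sketched in Python as follows. The
helper names are hypothetical, the "Avocado Salad" record is made
up for illustration, and the code stands in for two separate runs
of KWSEARCH rather than reproducing the program itself.

```python
def has(criterion, words):
    """True if any record word begins with the criterion (case ignored)."""
    c = criterion.upper()
    return any(w.upper().startswith(c) for w in words)


def with_onions_or_garlic(words):
    # AVOCADO AND (ONIONS OR GARLIC)
    return has("AVOCADO", words) and (has("ONIONS", words) or has("GARLIC", words))


def without_onions_or_garlic(words):
    # AVOCADO AND NOT(ONIONS OR GARLIC)
    return has("AVOCADO", words) and not (has("ONIONS", words) or has("GARLIC", words))


records = {
    "Guacamole": ["GUACAMOLE", "AVOCADOS", "GARLIC", "ONION", "MEXICAN"],
    "Enchilada Sauce": ["ENCHILADA", "SAUCE", "TOMATOES", "ONIONS",
                        "GARLIC", "MEXICAN"],
    "Avocado Salad": ["AVOCADO", "LIME"],  # hypothetical record
}

first = [name for name, kw in records.items() if with_onions_or_garlic(kw)]
second = [name for name, kw in records.items() if without_onions_or_garlic(kw)]
sorted_file = first + second  # second search appended to the first
print(sorted_file)
```

No record can appear in both lists, because the second set negates
the same (ONIONS OR GARLIC) condition that the first set requires.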
Destination File
----------------
Any valid text file name can be specified as the destination
file, with the exception of subdirectories and paths. For faster
program execution, it is recommended that the destination file be
on a different disk drive than the source files.
For Fastest Performance
-----------------------
If you have a RAM disk, conduct your search from source files in
the RAM disk.
Improvements to KWSEARCH
------------------------
KWSEARCH is being redesigned for easier use and higher
performance. A future version will provide more detailed search
statistics. Any comments you may have will be appreciated and
will be considered for implementation. Please send comments to:
Geoplot Computer Projects
PO Box 46173, station G
Vancouver, BC Canada V6R 4G5